F3. Continuous random variables

Author

Ullrika Sahlin

(cont F2) Variance

Let \(\mu\) be the expected value for the random variable \(X\)

The variance describes the spread around the expected value. More specifically, the variance is the expected value of the squared distance between \(X\) and its expected value.

\(V(X) = E((X-E(X))^2)\)

or

\(V(X) = E((X-\mu)^2)\)

Note

The reason for squaring the distances is that there will be both negative and positive distances, and they can ‘cancel each other out’ if summed directly.

The standard deviation is \(\sqrt{V(X)}\); it measures spread on the same scale as the random variable \(X\).

(cont F2) Variance for a discrete random variable

\(X\) is a discrete random variable

\(V(X)=\sum_{\text{all x}} (x-\mu)^2P(X=x)\)

Example: Dice roll

Let \(X= \text{"number of dots"}\)

We have earlier derived that \(E(X)=3.5\)

\[\begin{split} & V(X)=\sum_{x=1}^6(x-3.5)^2\cdot \frac{1}{6} = \\ & \frac{1}{6} ((1-3.5)^2+(2-3.5)^2+(3-3.5)^2+(4-3.5)^2+(5-3.5)^2+(6-3.5)^2) = \\ & \frac{1}{6} ((-2.5)^2 + (-1.5)^2+ (-0.5)^2+ (0.5)^2+ (1.5)^2+ (2.5)^2) = \frac{17.5}{6} \end{split}\]
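The sum above can be checked with a few lines of Python. This is a minimal sketch that assumes a fair die; the variable names are chosen here for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Outcomes of a fair six-sided die, each with probability 1/6
outcomes = range(1, 7)
p = Fraction(1, 6)

# E(X): probability-weighted sum of the outcomes
die_mean = sum(x * p for x in outcomes)

# V(X): probability-weighted sum of squared distances to the mean
die_variance = sum((x - die_mean) ** 2 * p for x in outcomes)

# Standard deviation, on the same scale as X
die_sd = float(die_variance) ** 0.5

print(die_mean, die_variance, die_sd)
```

The variance comes out as 35/12, which is the same number as 17.5/6.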

(cont F2) Exam questions on the expected value

On a specific road the random variable \(X = \text{number of accidents during a week}\) has the following distribution

Outcome (x) 0 1 2 3
\(P(X = x)\) 0.70 0.20 0.06 0.04

Calculate the expected number of accidents during a week

\[\begin{split} & E(X) = \sum_{x=0}^3 x\cdot P(X=x) = \\ & 0\cdot 0.70 + 1 \cdot 0.20 + 2\cdot 0.06+ 3\cdot 0.04 = 0.44 \end{split}\]

\(\therefore E(X)=\mu = 0.44\)

The symbol \(\therefore\) means "therefore", i.e. "my conclusion is that".

What is the variance? (this was not on the exam)

\[\begin{split} & V(X)=\sum_{x=0}^3 (x-\mu)^2\cdot P(X=x) =\sum_{x=0}^3 (x-0.44)^2\cdot P(X=x)=\\ & (0-0.44)^2\cdot 0.70 + (1-0.44)^2 \cdot 0.20 + (2-0.44)^2\cdot 0.06+ (3-0.44)^2\cdot 0.04 =\\ & 0.6064 \end{split}\]
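Both calculations for the accident example follow the same pattern, so they can be sketched in code from the probability table. The dictionary `accident_pmf` is a name introduced here for illustration.

```python
import math

# Probability function from the table: outcome x -> P(X = x)
accident_pmf = {0: 0.70, 1: 0.20, 2: 0.06, 3: 0.04}

# A probability function must sum to 1
assert math.isclose(sum(accident_pmf.values()), 1.0)

# E(X) = sum of x * P(X = x)
accident_mean = sum(x * p for x, p in accident_pmf.items())

# V(X) = sum of (x - mu)^2 * P(X = x)
accident_variance = sum((x - accident_mean) ** 2 * p for x, p in accident_pmf.items())

print(round(accident_mean, 4), round(accident_variance, 4))
```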

(cont F2) Probability and distribution functions for discrete r.v.

Pair the following \(f(x)\) with \(F(x)\)

Example. Poisson distribution

\(X=\text{"number of spams per hour"}\)

Model: \(X \sim Po(\lambda_X)\) where \(\lambda_X=0.5\)

What is the probability of receiving at least 6 spams in one day?

The first thing we need to do is convert the model’s intensity parameter so that it gives the number over the correct time unit. From per hour to per day.

1 spam per hour corresponds to 24 spams per day.

\(Y=\text{"number of spams per day (24 hours)"}\)

Model: \(Y \sim Po(\lambda_Y)\) where \(\lambda_Y=24\cdot 0.5 = 12\)

\[\begin{split} & P(Y\geq 6) = 1 - P(Y < 6) = \\ & 1-P(Y\leq 5) = 1 - F_Y(5) = \\ & 1- 0.0203 = 0.9797 \end{split}\]

Alternatively, one can calculate \(F_Y(5)\) directly from the probability function, using \(\lambda_Y = 12\)

\[\begin{split} F_Y(5) = & P(Y=0)+P(Y=1)+ \dots + P(Y=5) =\\ & \frac{12^0e^{-12}}{0!} + \frac{12^1e^{-12}}{1!} +\dots +\frac{12^5e^{-12}}{5!} \end{split}\]
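The direct calculation can be sketched as follows, assuming the model \(Y \sim Po(12)\) from above. The helper `poisson_pmf` is written here for illustration and is not a library function.

```python
import math

def poisson_pmf(x, lam):
    """Probability function of a Poisson distribution: P(X = x)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

# Convert the intensity from per hour to per day (24 hours)
lam_day = 24 * 0.5

# F_Y(5) = P(Y = 0) + P(Y = 1) + ... + P(Y = 5)
cdf_5 = sum(poisson_pmf(x, lam_day) for x in range(6))

# Complementary event: P(Y >= 6) = 1 - P(Y <= 5)
p_at_least_6 = 1 - cdf_5

print(round(cdf_5, 4), round(p_at_least_6, 4))
```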

(cont F2) Expected value for a Poisson distribution

Difficult! Not included in the material for the course, but useful to have heard about.

\(X \sim Po(\lambda)\) and \(f(x) = \frac{\lambda^xe^{-\lambda}}{x!}\)

\[\begin{split} & E(X) = \sum_{x=0}^{\infty}x\cdot f(x) = \sum_{x=1}^{\infty}x\cdot f(x) = \\ & \sum_{x=1}^{\infty}x\cdot \frac{\lambda^xe^{-\lambda}}{x!} = \sum_{x=1}^{\infty}\frac{\lambda^xe^{-\lambda}}{(x-1)!} = \\& e^{-\lambda} \sum_{x=1}^{\infty}\frac{\lambda^x}{(x-1)!} = \\ & \lambda \cdot e^{-\lambda} \sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \\ & \lambda \cdot e^{-\lambda} \sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = \lambda\cdot e^{-\lambda}\cdot e^{\lambda} = \lambda \end{split}\]

In the second-to-last step we used the mathematical result that \(\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{\lambda}\)
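The result \(E(X) = \lambda\) can also be checked numerically by truncating the infinite sum. This is only a sanity check and not a proof; the cut-off at 100 terms is an assumption that works because the remaining terms are negligible for the two intensities used earlier.

```python
import math

def poisson_pmf(x, lam):
    """Probability function of a Poisson distribution: P(X = x)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

N_TERMS = 100  # truncation point of the infinite sum (assumed large enough)

truncated_means = {}
for lam in (0.5, 12.0):
    # E(X) is approximated by the sum of x * f(x) for x = 0, ..., N_TERMS - 1
    truncated_means[lam] = sum(x * poisson_pmf(x, lam) for x in range(N_TERMS))

print(truncated_means)
```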

Continuous random variables

  • A continuous random variable \(X\) can take any value in an interval, i.e. uncountably many values. This means that the probability of any single value is zero:

\[P(X =x) = 0\]

  • Instead, we study probability for intervals, e.g. the interval \([a,b]\):

\[P(a \leq X \leq b)\]

  • The distribution of a continuous r.v. \(X\) can be described by a density function (probability density function, PDF)

\[f_X(x) \geq 0\]

Density function for a continuous r.v.

Example. Uniform distribution

\[f(x) = \left\{ \begin{array}{lr} \frac{1}{b-a} & a \leq x \leq b\\ 0 & \text{otherwise} \end{array}\right.\]

A uniform distribution is suitable for a r.v. that takes values in an interval \([a,b]\) where all sub-intervals of equal length are equally probable.

Example. Exponential distribution

\[f(x) = \left\{ \begin{array}{lr} \lambda\cdot e^{-\lambda x} & x \geq 0\\ 0 & \text{otherwise} \end{array}\right.\]

The exponential distribution takes non-negative values \(x \geq 0\).

It is a suitable distribution for describing the time it takes for an event to occur, such as waiting time for a bus or getting an appointment with the doctor.

Example. Normal distribution

\(f(x) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) where \(-\infty < x < \infty\)

A normal distribution has two parameters, \(\mu\) and \(\sigma^2\). They coincide with the expected value and the variance of the distribution.

Distribution function for a continuous r.v.

  • Probability corresponds to an area under the density function

  • The distribution function is the area up to the outcome \(x\)

\[F(x)=\int_{-\infty}^{x} f(v)dv\]

  • The total area under the density function is always 1

\[\int_{-\infty}^{\infty} f(x)dx = 1\]

  • \(P(X < x) = P(X\leq x)\) for continuous r.v. (not in general for discrete r.v.)

Example. Uniform distribution

The random variable \(X\) is uniformly distributed in the interval 0 to 10.

We know that the density function is \[f(x) = \left\{ \begin{array}{lr} \frac{1}{10} & 0 \leq x \leq 10\\ 0 & \text{otherwise} \end{array}\right.\]

What is the probability that \(X\) is less than or equal to 7?

\[\begin{split} & P(X \leq 7) = F(7) = \int_{-\infty}^7 f(x)dx = \\ & \int_0^7 \frac{1}{10}dx = [\frac{x}{10}]_{x=0}^{7} = \\ & \frac{7}{10} - \frac{0}{10} = \frac{7}{10} \end{split}\]
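The integral above generalises to any uniform distribution on \([a,b]\). A minimal sketch, where `uniform_cdf` is a helper written here for illustration:

```python
def uniform_cdf(x, a, b):
    """Distribution function F(x) of a uniform distribution on [a, b]."""
    if x < a:
        return 0.0  # no probability mass to the left of a
    if x > b:
        return 1.0  # all probability mass lies to the left of x
    return (x - a) / (b - a)  # area of a rectangle with height 1/(b-a)

p_at_most_7 = uniform_cdf(7, 0, 10)
print(p_at_most_7)
```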

Example. Exponential distribution

The random variable \(X\) is exponentially distributed with the parameter \(\lambda = \frac{3}{2}\)

We know that \[f(x) = \left\{ \begin{array}{lr} \lambda\cdot e^{-\lambda x} & x \geq 0\\ 0 & \text{otherwise} \end{array}\right.\]

What is the probability that \(X\) is less than or equal to 2?

\[\begin{split} & P(X \leq 2) = F(2) = \int_{-\infty}^2 f(x)dx = \\ & \int_{0}^2 \lambda\cdot e^{-\lambda x}dx =\int_{0}^2 \frac{3}{2}\cdot e^{-\frac{3}{2} x}dx = \\ & [-e^{-\frac{3}{2} x}]_{x=0}^{2} = -e^{-\frac{3}{2}\cdot 2} - (-e^{-\frac{3}{2} \cdot 0}) = \\ & -e^{-3} + 1 = 1 - e^{-3} \approx 0.950 \end{split}\]

Distribution function for an exponential distribution

The distribution function for an exponential distribution is

\[F(x) = \left\{ \begin{array}{lr} 1 - e^{-\lambda x} & x \geq 0\\ 0 & \text{otherwise} \end{array}\right.\]
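To illustrate that the distribution function really is the area under the density, the sketch below compares the closed-form expression with a numerical integral for \(\lambda = \frac{3}{2}\) and \(x = 2\). The midpoint rule and its number of steps are choices made here for illustration.

```python
import math

def exp_pdf(x, lam):
    """Density function of an exponential distribution."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """Distribution function of an exponential distribution."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 3 / 2
closed_form = exp_cdf(2, lam)                                    # 1 - e^(-3)
numeric = midpoint_integral(lambda x: exp_pdf(x, lam), 0, 2)     # area under f from 0 to 2

print(round(closed_form, 4), round(numeric, 4))
```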

Complementary event for a continuous r.v.

\(P(X \geq x) = 1 - P(X < x) \underbrace{ =}_{P(X=x)=0} 1 - P(X \leq x)\)

Probability over an interval

\(P(a < X \leq b) = P(X \leq b) - P(X \leq a)\)

Example: Interval

\(P(-2 < X \leq 1)\)

Expected value and variance for a continuous r.v.

\(X\) is a continuous random variable

\(\mu= E(X) = \int_{-\infty}^{\infty} xf(x)dx\)

\(\sigma^2 = V(X) = \int_{-\infty}^{\infty} (x-\mu)^2f(x)dx\)

Example: Exponential distribution

\(X \sim Exp(\lambda)\)

\(E(X) = \frac{1}{\lambda}\)

\(V(X) = \frac{1}{\lambda^2}\)
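The two formulas can be checked against the integral definitions of \(E(X)\) and \(V(X)\). The sketch below uses \(\lambda = \frac{3}{2}\) and replaces the infinite upper limit with 40, which is an assumption: the density is practically zero beyond that point.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

lam = 3 / 2
UPPER = 40.0  # stands in for infinity (assumed large enough)

def pdf(x):
    """Density of Exp(lam) for x >= 0."""
    return lam * math.exp(-lam * x)

# E(X) = integral of x * f(x)
exp_mean = midpoint_integral(lambda x: x * pdf(x), 0.0, UPPER)

# V(X) = integral of (x - mu)^2 * f(x)
exp_variance = midpoint_integral(lambda x: (x - exp_mean) ** 2 * pdf(x), 0.0, UPPER)

print(round(exp_mean, 4), 1 / lam)
print(round(exp_variance, 4), 1 / lam ** 2)
```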

Exponential distribution on wiki

Discrete and continuous r.v.

Normal distribution

  • The normal distribution is useful and often appears when describing natural phenomena

  • The normal distribution is a good description of random variation for sums of independent and identically distributed random variables

  • We will spend a lot of time on the normal distribution in this course

  • There is a trick to get the value of the distribution function for any parameter values
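The statement about sums can be illustrated with a small simulation. The sketch below sums 30 dice rolls many times and checks how often the sum falls within one standard deviation of its expected value; for a normal distribution this happens roughly 68% of the time. The number of dice, the number of repetitions and the seed are arbitrary choices made here.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_DICE = 30          # dice summed in each experiment (arbitrary choice)
N_REPEATS = 10_000   # number of simulated sums (arbitrary choice)

sums = [
    sum(random.randint(1, 6) for _ in range(N_DICE))
    for _ in range(N_REPEATS)
]

# For independent rolls, E and V of the sum are N_DICE times those of one roll
mu = N_DICE * 3.5
sigma = (N_DICE * 17.5 / 6) ** 0.5

# Fraction of simulated sums within one standard deviation of the mean
within_one_sd = sum(abs(s - mu) <= sigma for s in sums) / N_REPEATS

print(statistics.mean(sums), statistics.stdev(sums), within_one_sd)
```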

Density function for a normal distribution

\(X \sim N(\mu,\sigma)\)

Standard deviation or variance

Some textbooks and software use the variance as the second parameter, \(N(\mu,\sigma^2)\). In these notes the second parameter is the standard deviation \(\sigma\).

  • The density function for a normal distribution looks like a church bell

  • The normal distribution is symmetric around \(\mu\)

\(F(\mu + x) = 1 - F(\mu - x)\), which for \(\mu = 0\) becomes \(F(x) = 1 - F(-x)\)

  • Mode, median and expected value coincide for a normal distribution

Distribution function for a normal distribution

\(X \sim N(\mu,\sigma)\)

\[\begin{split} P(X \leq 0.1) & = F(0.1) = \int_{-\infty}^{0.1}f(x)dx = \\ & \int_{-\infty}^{0.1}\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx \end{split}\]

Let us assume that \(\mu=0\) and \(\sigma=1\)

\[=\int_{-\infty}^{0.1}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}dx = \dots\text{ has no closed-form solution; it must be evaluated numerically}\]
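A computer evaluates this integral numerically. The sketch below does so with a simple midpoint rule for \(\mu=0\) and \(\sigma=1\), and compares the result with `statistics.NormalDist` from the Python standard library. Replacing the lower limit \(-\infty\) with \(-10\) is an assumption; the density is practically zero below that point.

```python
import math
from statistics import NormalDist

def std_normal_pdf(x):
    """Density function of N(0, 1)."""
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

LOWER = -10.0  # stands in for minus infinity (assumed small enough)

numeric_cdf = midpoint_integral(std_normal_pdf, LOWER, 0.1)
library_cdf = NormalDist(mu=0, sigma=1).cdf(0.1)

print(round(numeric_cdf, 4), round(library_cdf, 4))
```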

Distribution function for a normal distribution - table

Instead of calculating the integral we can use

  • tables
  • calculators/computer programs

If we only have a table, how do we handle all possible values of the expected value \(\mu\) and the variance \(\sigma^2\)?

The solution is to standardise the distribution

Standardised Normal distribution

\(X \sim N(3,4)\)

Create a new r.v. \(Z = \frac{X-3}{4}\)

One can show that \(Z \sim N(0,1)\) which is a standardised normal distribution.

The following holds \(X = 3 + 4\cdot Z\)

The distribution function for a standardised normal distribution is denoted \(\Phi(z)\) and is available in a table

\[\begin{split} & P(X \leq 4) = P(\frac{X-3}{4} \leq \frac{4-3}{4}) = \\ & P(Z \leq 0.25) = \Phi(0.25) \underbrace{= 0.5987}_{\text{from table}} \end{split}\]
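The standardisation step can be verified in code: the probability computed directly from \(X \sim N(3,4)\) should equal \(\Phi(0.25)\). The sketch uses `statistics.NormalDist`, whose second parameter is the standard deviation, as in these notes.

```python
from statistics import NormalDist

mu, sigma = 3, 4
X = NormalDist(mu=mu, sigma=sigma)
Z = NormalDist()  # standardised normal distribution N(0, 1)

# Direct calculation with the distribution of X
p_direct = X.cdf(4)

# Same probability after standardising: z = (x - mu) / sigma
z = (4 - mu) / sigma
p_standardised = Z.cdf(z)

print(z, round(p_direct, 4), round(p_standardised, 4))
```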

Standardised normal distribution and normal distribution

Let \(Z \sim N(0,1)\)

Then \(X = \mu + \sigma \cdot Z\) is also normally distributed with expected value \(\mu\) and variance \(\sigma^2\), i.e.
\[X \sim N(\mu,\sigma)\]

Example. Normal distribution

Let \(X \sim N(5,2)\)

\[\begin{split} & P(X \geq 0) = 1 - P(X < 0) = 1 - P(X \leq 0) = \\ & 1 - P(\frac{X-5}{2} \leq \frac{0-5}{2}) = 1 - \Phi(\frac{-5}{2}) = \\ & 1 - (1-\Phi(\frac{5}{2})) = \Phi(2.5) \approx 0.9938 \end{split}\]

Exam question

The weight of a skier with equipment is modelled by a normal distribution with expected value 80 kg and variance 36 kg\(^2\). The skier Kim is alone in the lift. What is the probability that his weight exceeds 90 kg?

Let \(X = \text{"weight in kg"}\)

Model: \(X \sim N(80,6)\), since \(\sigma = \sqrt{36} = 6\)

\[\begin{split} & P(X > 90) = 1 - P(X \leq 90) = \\ & 1 - P(\frac{X-80}{6} \leq \frac{90-80}{6}) = 1 - \Phi(\frac{10}{6}) \approx \\ & 1 - \Phi(1.67) = 1 - 0.9525 = 0.0475 \end{split}\]
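The exam question can be checked with software instead of a table. In this sketch the standard deviation is derived from the given variance before the distribution object is created. The software result differs slightly from a table lookup because a table requires rounding \(\frac{10}{6}\) to two decimals.

```python
import math
from statistics import NormalDist

expected_weight = 80      # kg
variance_weight = 36      # kg^2

# NormalDist takes the standard deviation, not the variance
sd_weight = math.sqrt(variance_weight)
weight = NormalDist(mu=expected_weight, sigma=sd_weight)

# P(X > 90) = 1 - P(X <= 90)
p_exceeds_90 = 1 - weight.cdf(90)

print(sd_weight, round(p_exceeds_90, 3))
```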

Quantile

A quantile is a value that divides a probability distribution into two parts with given probabilities.

\[P(X \leq x_{.98}) = 0.98\]

or, equivalently, in terms of the upper tail

\[P(X > \lambda_{.02}) = 0.02\]

Examples of quantiles

  • Median – the quantile that divides the distribution into two halves with 50% probability each

  • Quartiles – the quantiles that split a distribution into four parts with equal probability:

    • First quartile (Q1)
    • Second quartile = Median
    • Third quartile (Q3)
  • Percentile – the p:th percentile is the value \(x\) below which p% of the probability lies, i.e. \(P(X \leq x) = p/100\)
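The quantiles listed above can be computed with the inverse of the distribution function. The sketch below uses a standardised normal distribution purely as an example; `inv_cdf` and `quantiles` are methods of `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standardised normal distribution, chosen as an example

# Quartiles: the three cut points that split the distribution into four equal parts
q1, median, q3 = Z.quantiles(n=4)

# The quantile x with P(Z <= x) = 0.98, i.e. the 98th percentile
x_98 = Z.inv_cdf(0.98)

# Applying the distribution function to a quantile returns its probability
p_check = Z.cdf(x_98)

print(round(q1, 4), round(median, 4), round(q3, 4), round(x_98, 4))
```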

Quantiles illustrated with a distribution function

Quantiles illustrated with a density function

Quantiles illustrated with a boxplot

Quantiles of the normal distribution

We will use quantiles from a standardised normal distribution to create statistical tests and confidence intervals

The table sheet contains some commonly used quantiles

Extra examples

Let \(X \sim N(5,2)\)

  1. \(P(X \leq 6) = P(\frac{X-5}{2} \leq \frac{6-5}{2}) = \Phi(\frac{1}{2}) = 0.6915\)

  2. \(P(1.8 < X < 7.2)\)

\[\begin{split} & P(1.8 < X < 7.2) = P(X < 7.2) - P(X \leq 1.8) = \\ & \Phi(\frac{7.2-5}{2})-\Phi(\frac{1.8-5}{2}) = \Phi(1.1) - \Phi(-1.6) = \\ & \Phi(1.1) - (1 - \Phi(1.6)) = 0.8643 - (1 - 0.9452) = 0.8095 \approx 0.810 \end{split}\]

  3. Find \(a\) such that \(P(X \leq a) = 0.05\)

Let \(Z\) be a standardised normal random variable, \(Z \sim N(0,1)\)

We know the following: \(P(X \leq a) = P(Z \leq \frac{a-5}{2})\)

If we can find the quantile for \(Z\), then we can derive the quantile for \(X\)

From the quantile table we see that \(P(Z \leq z_{.05}) = 0.05\) when \(z_{.05} = -1.645\)

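Here we use the symmetry \(\lambda_{1-\alpha} = -\lambda_{\alpha}\): since \(\lambda_{.05} = 1.645\), we get \(z_{.05} = \lambda_{.95} = -\lambda_{.05} = -1.645\)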

Then \(a = x_{.05} = 5 + 2 \cdot z_{.05} = 5 + 2 \cdot (-1.645) = 1.71\)
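The three extra examples can be checked with software. This is a sketch using `statistics.NormalDist` with \(\mu = 5\) and \(\sigma = 2\); the quantile is computed both through the standardised variable \(Z\) and directly from \(X\).

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)
Z = NormalDist()  # standardised normal distribution N(0, 1)

# Example 1: P(X <= 6)
p_at_most_6 = X.cdf(6)

# Example 2: P(1.8 < X < 7.2) = F(7.2) - F(1.8)
p_interval = X.cdf(7.2) - X.cdf(1.8)

# Example 3: find a such that P(X <= a) = 0.05
z_05 = Z.inv_cdf(0.05)          # quantile of Z
a_via_z = 5 + 2 * z_05          # transform back: x = mu + sigma * z
a_direct = X.inv_cdf(0.05)      # same quantile computed directly from X

# Symmetry of the standardised normal distribution
z_95 = Z.inv_cdf(0.95)

print(round(p_at_most_6, 4), round(p_interval, 4))
print(round(z_05, 3), round(z_95, 3), round(a_via_z, 2), round(a_direct, 2))
```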